INTRODUCTION:


Prostate  cancer  is  the  most  common  cancer  diagnosed  in  males  in  the  developed  world. 

While  risk  factors  suggest  a  genetic  basis  for  the  disease,  the  search  for  causal  genes  has  yielded 
few results. In the last decade, genome-wide association studies (GWAS) have greatly helped in
the  identification  of  common  risk  variants  associated  with  complex  diseases  such  as  cancer; 
routinely, these associated polymorphisms are located within gene deserts and other types of
non-coding DNA (1). A striking example of GWAS implicating non-coding variants in the etiology
of  cancer  can  be  seen  on  chromosome  8q24,  where  numerous  studies  have  reported  associations 
between  prostate,  colorectal,  breast  and  urinary  bladder  cancer  and  variants  concentrated  within  a 
1.2Mb  gene  desert  (2-12).  Evidence  for  prostate  cancer  association  is  particularly  strong,  with 
five distinct linkage disequilibrium (LD) blocks spanning a 440kb interval harboring risk
variants.  Although  there  are  no  well-characterized  genes  within  the  interval,  the  proto-oncogene 
MYC  lies  just  downstream  of  the  gene  desert,  raising  the  possibility  that  the  associated  risk 
regions may harbor long-range cis-regulatory elements - such as enhancers - involved in the
tissue-specific  transcriptional  regulation  of  MYC.  Under  this  hypothesis,  each  distinct  prostate 
cancer  association  interval  would  contain  a  functional  element  involved  in  regulating  MYC 
expression  in  the  prostate.  The  purpose  of  this  proposal  is  to  identify  and  characterize  prostate 
enhancers  within  the  prostate  cancer  associated  intervals  on  8q24  using  a  combination  of  in  vivo 
and  in  vitro  reporter  assays. 


BODY: 

In  the  course  of  the  past  year,  we  have  located  and  characterized  a  prostate  enhancer  that 
encompasses  the  prostate  cancer  associated  SNP  rs6983267.  Furthermore,  we  have 
demonstrated  that  this  enhancer  element  exhibits  allele-specific  in  vivo  enhancer  activity  in 
developing  and  mature  mouse  prostates  (13)  (see  Appendix  for  full  publication).  These 
achievements  support  the  hypothesis  put  forth  in  my  initial  proposal  and  represent  significant 
progress  towards  the  execution  of  my  research  Aims.  The  following  section  addresses  each 
specific  task  within  the  Statement  of  Work  by  detailing  the  progress  and  accomplishments  of  the 
last  year. 

Task 1a: Perform in situ hybridization for all genes within a 1.0Mb interval surrounding the
prostate  cancer-associated  region. 

Because  of  its  status  as  an  ideal  positional  and  functional  target  gene,  we  began  our  in  situ 
hybridizations  by  assessing  Myc  expression  in  the  genitourinary  apparatus  of  male  mice. 
Digoxigenin-labeled  Myc  antisense  and  sense  riboprobes  were  generated  from  a  full-length 
mouse Myc cDNA clone. Staining was performed on whole P8 and P21 mouse prostates for 48
hours.  As  expected,  we  observed  Myc  expression  in  the  developing  and  mature  prostate,  as  well 
as  in  the  coagulating  glands,  seminal  vesicles,  and  ductus  deferens  (13)  (Appendix,  Figure  2  of 
paper).  This  expression  correlated  very  well  with  the  reporter  gene  expression  pattern  driven  by 
the  rs6983267-containing  enhancer  we  identified  (described  below). 

While  the  parallel  between  the  enhancer’s  domain  and  Myc  expression  in  the  prostate  is 
compelling, it does not directly show that MYC is the target gene for the 8q24 cis-regulatory
element. In theory, other genes within the enhancer’s potential range of influence - including
FAM84B, POU5F1P1, and PVT1 - could also exhibit prostate expression and be the target for
the  element’s  regulatory  influence.  However,  work  published  since  the  submission  of  this 
proposal  has  convincingly  demonstrated  that  the  8q24  prostate  cancer  associated  regions 
(including  the  rs6983267-containing  enhancer  we  describe)  physically  interact  with  the  MYC 
promoter in prostate cancer cell lines (14-16). These studies all employed chromosome
conformation capture (3C), a technique that assesses whether a specific fragment can loop over
large genomic distances to physically connect with another DNA region (17). This direct link
between the prostate cancer associated regions and the MYC promoter - located between 250kb
and 650kb away - shows that MYC is the target gene of the cancer associated cis-regulatory
regions.  This  obviates  the  need  to  investigate  the  expression  patterns  of  other  nearby  genes. 

Tasks 1b, 1c and 1d: Use progressively smaller DNA fragments in in vivo mouse transgenic
assays  to  identify  and  localize  enhancers  within  the  8q24  region. 

To  initially  examine  the  8q24  gene  desert  for  regulatory  elements,  we  surveyed  the  region 
using  a  broad-scale  BAC  scan  approach.  We  identified  three  overlapping  human  BACs 
encompassing the prostate cancer risk regions (CTD-2506D10, RP11-124F15, and
CTD-2533C10), which together span 480kb of non-coding DNA (Appendix, Figure 2 of paper). Each
BAC carried the prostate cancer-associated risk haplotype and was tagged through a Tn7
transposon-mediated random insertion of a β-galactosidase (lacZ) gene driven by a β-globin
minimal promoter (18). The lacZ cassette integration converts the BACs into enhancer trapping
systems, whereby any long-range enhancer(s) contained within each ~180kb BAC can act upon
the reporter gene to drive tissue- and temporal-specific β-galactosidase expression. The design
of overlapping BACs helps to narrow the critical region of interest efficiently,
as expression profiles unique to only one BAC must be due to uniquely contained sequences;
conversely, identical expression patterns present in overlapping BACs suggest that the functional
element driving β-galactosidase expression must be contained in the shared genomic region. A
detailed account of these experiments can be found in the attached manuscript (Appendix,
Results and Methods).
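
The overlap reasoning described above can be sketched in a few lines of Python. This is an illustrative sketch only and was not part of the analysis performed. The coordinates are hypothetical, chosen solely so that the three BACs together span 480kb and the two expressing BACs share 59kb; the names HYPOTHETICAL_BACS and candidate_region are our own. The sketch also makes the simplifying assumption that a BAC lacking prostate expression excludes the sequence it contains, which ignores position effects and silencer elements.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end), in kb

# HYPOTHETICAL coordinates (kb), not the real genomic positions of these BACs.
HYPOTHETICAL_BACS: Dict[str, Interval] = {
    "CTD-2506D10": (0, 170),
    "RP11-124F15": (150, 330),
    "CTD-2533C10": (271, 480),
}


def candidate_region(bacs: Dict[str, Interval], positive: Set[str]) -> List[Interval]:
    """Intervals shared by every expressing BAC and absent from every non-expressing BAC."""
    if not positive:
        return []
    # Intersection of all BACs that showed the expression pattern.
    start = max(bacs[name][0] for name in positive)
    end = min(bacs[name][1] for name in positive)
    if start >= end:
        return []
    pieces = [(start, end)]
    # Subtract sequence carried by BACs that did not show the pattern.
    for name, (neg_start, neg_end) in bacs.items():
        if name in positive:
            continue
        remaining = []
        for s, e in pieces:
            if neg_end <= s or neg_start >= e:
                remaining.append((s, e))  # no overlap with this negative BAC
                continue
            if s < neg_start:
                remaining.append((s, neg_start))
            if neg_end < e:
                remaining.append((neg_end, e))
        pieces = remaining
    return pieces


# Prostate expression seen in two of the three BACs.
print(candidate_region(HYPOTHETICAL_BACS, {"RP11-124F15", "CTD-2533C10"}))
# -> [(271, 330)], i.e. the 59kb shared segment under these assumed coordinates
```

With real BAC end coordinates substituted for the hypothetical ones, the same function would return the shared segment to carry forward into smaller-fragment assays.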

The  in  vivo  BAC  transgenic  reporter  assays  identified  prostate  enhancer  activity  contained 
within the 8q24 gene desert (Appendix, Figure 1 of paper). While we did not observe
β-galactosidase prostate expression in BAC CTD-2506D10 transgenic mice (12 independent
transgenics), animals harboring BACs CTD-2533C10 and RP11-124F15 displayed
β-galactosidase prostate expression at days P0, P8 and P21 (13). Because of the highly similar
reporter expression patterns obtained from BACs RP11-124F15 and CTD-2533C10, including
prostate,  coagulating  gland,  and  urethral/bladder  lining,  we  hypothesized  that  our  BAC 
transgenic  assays  were  identifying  a  single  prostate  enhancer  within  the  59kb  shared  genomic 
segment  of  these  two  BACs.  Interestingly,  one  of  the  most  strongly  associated  prostate  cancer 
risk  SNPs,  rs6983267,  is  contained  within  this  59kb  overlapping  interval  and  disrupts  an 
evolutionarily  conserved  sequence  (Appendix,  Figure  1  of  paper). 

Rather  than  using  fosmids  as  an  intermediate  means  to  localize  the  putative  prostate  enhancer 
element,  we  directly  tested  the  rs6983267-containing  evolutionarily  conserved  element  for 
regulatory  potential  in  vivo.  A  5kb  DNA  fragment  containing  each  allele  of  this  SNP  was  cloned 
into  a  lacZ  reporter  cassette  using  Invitrogen’s  Gateway  cloning  system  and  transgenic  mice 
harboring  either  the  risk  or  the  non-risk  variant  of  rs6983267  were  generated  and  analyzed.  We 
determined that the conserved sequence containing the prostate cancer GWAS SNP displayed
allele-specific in vivo prostate enhancer properties (13) (Appendix, Figure 2 of paper).
Specifically, the risk allele, rs6983267-G, led to consistently stronger β-galactosidase expression
in prostates and coagulating glands than the non-risk allele, rs6983267-T, in P0, P8 and P21
transgenic mice (Appendix, Figures 2 and 3). The expression pattern driven by the rs6983267-G
risk allele in 3 independent mouse transgenic lines closely resembled that observed in BACs
RP11-124F15 and CTD-2533C10 - both of which also harbor the risk allele. In contrast, the
rs6983267-T non-risk allele led to weakened prostate and coagulating gland expression in 3
independent transgenic lines. For each allelic variant evaluated, those transgenic founders
exhibiting enhancer activity showed highly concordant β-galactosidase expression in the
prostate, with a clear qualitative difference between the risk and non-risk variants.

These  results  demonstrate  that  our  BAC-based  enhancer  trapping  screen  is  a  powerful 
resource to rapidly uncover cis-regulatory regions in large DNA segments. We plan to use this
screening  tool  to  further  refine  our  findings,  uncovering  other  prostate-specific  regulatory 
elements  in  this  locus.  As  other  non-coding  SNPs  within  this  8q24  gene  desert  are  associated 
with  increased  prostate  cancer  risk  -  independently  of  the  SNP  we  already  characterized  -  we 
anticipate  discovering  other  prostate  enhancers  with  allele-specific  function  upstream  of  MYC. 

Task 2a: Construct a reporter plasmid for luciferase assay analysis in prostate cancer cell lines.

Two luciferase assay reporter plasmids have been constructed for the quantitative analysis of
enhancer potential in prostate cancer cell lines. Both make use of Promega’s pGL4 vectors. The
first uses the minimal promoter present in pGL4.23, with the only alteration being the
addition of Invitrogen’s Gateway cassette into the multiple cloning site. This allows for the easy
shuttling of multiple elements into the vector without the need for traditional cloning. The
second vector began with Promega’s promoterless pGL4.10, into which the MYC promoter and
the Gateway cassette were both inserted. MYC is known to be expressed from numerous
promoters, with the majority of transcripts initiating from promoter 2 (P2); as its proximal
regulation is still not entirely understood, we wished to be overly conservative in the definition
of “promoter” to ensure that all necessary elements were present in our reporter construct (19).
To that end, 1.7kb of sequence upstream of the MYC transcriptional start site was cloned into the
pGL4.10 vector. This element will be tested prior to use with putative enhancer elements to
determine its basal regulatory potential in prostate cancer cell lines.


KEY  RESEARCH  ACCOMPLISHMENTS: 

•  We  identified  a  prostate  enhancer  located  within  a  prostate  cancer  associated  region  capable 
of  driving  in  vivo  reporter  gene  expression  in  the  developing  and  mature  mouse  prostate. 
Furthermore,  we  showed  that  the  genotype  of  the  cancer  associated  SNP  rs6983267  - 
contained  within  this  enhancer  -  conveys  allele-specific  regulatory  potential  to  the  enhancer 
element,  with  the  risk  variant  possessing  stronger  enhancer  abilities  than  the  protective  allele. 
These  findings  were  published  in  the  high  impact  journal  Genome  Research  (13). 

•  Our  broad-scale  BAC  scan  of  the  8q24  gene  desert  showed  that  there  are  other  regulatory 
sequences  within  the  interval  of  interest;  specifically,  we  uncovered  a  mammary  gland 
enhancer  within  a  region  that  has  been  associated  with  risk  to  breast  cancer  (13)  (Appendix, 
Figure  1  of  paper).  These  results  demonstrate  that  we  have  generated  a  powerful  tool  to 
experimentally  interrogate  genomic  regions  showing  association  to  multiple  types  of  cancer, 
and that this tool can be widely disseminated among the cancer genetics research community.

• Our results, combined with those of other researchers working with colorectal cancer (14-16,
20, 21), demonstrate that the same genetic variation - known to increase risk to both
prostate and colorectal cancer - functions in both cases by altering the spatial, temporal,
and/or quantitative fine tuning of MYC expression through allele-specific enhancer activity.

•  As  our  in  vivo  enhancer  reporter  assays  allow  for  the  interrogation  of  regulatory  potential 
over  developmental  time,  we  were  able  to  demonstrate  that  the  rs6983267-containing 
enhancer is active throughout prostate organogenesis (13) (Appendix, Figures 2 and 3 of
paper). These results raise the intriguing possibility that the increased risk to prostate cancer
might result from a misregulation of MYC’s expression early in development, long before the
onset  of  tumorigenesis. 

• Based on these accomplishments, I have been invited to write a book chapter discussing
cis-regulatory mechanisms underlying cancer risk. This chapter will be included in an upcoming
book edited by Nadav Ahituv, Ph.D., and will focus prominently on prostate cancer
genetics.


REPORTABLE  OUTCOMES: 

Publications: 

Wasserman  NF,  Aneas  I,  Nobrega  MA.  An  8q24  gene  desert  variant  associated  with  prostate 
cancer risk confers differential in vivo activity to a MYC enhancer. Genome Research 20(9),
1191-1197  (2010). 

Presentations  at  Scientific  Meetings: 

Wasserman  NF,  Nobrega  MA.  An  8q24  gene  desert  variant  associated  with  prostate  cancer  risk 
confers  differential  in  vivo  activity  to  a  MYC  enhancer.  Poster.  IMPaCT,  2011. 


CONCLUSION: 

The  BAC  enhancer  trapping  strategy  that  we  employed  allowed  us  to  rapidly  interrogate  the 
440kb of 8q24 prostate cancer-associated non-coding DNA for cis-regulatory elements. We
effectively  screened  a  half-megabase  genomic  interval  in  vivo  using  only  three  constructs,  and 
succeeded  in  identifying  a  prostate  enhancer  within  an  interval  strongly  associated  with  prostate 
cancer.  In  addition,  we  localized  a  specific  prostate  enhancer  contained  within  the  overlapping 
region  of  two  of  our  BACs  and  showed  that  it  possessed  in  vivo  allele-specific  regulatory 
abilities  contingent  on  the  genotype  of  the  prostate  cancer  associated  SNP  rs6983267.  These 
results - showing the cancer risk allele demonstrating stronger enhancer potential than the
non-risk allele - are concordant with MYC’s known role as a proto-oncogene. Finally, we
demonstrated that the rs6983267-containing enhancer exhibits differential in vivo activity
throughout prostate organogenesis. As no association has been seen between rs6983267
genotype and steady-state MYC mRNA levels in normal prostate cells or prostate tumors (22),
our results raise the possibility that this variant exerts its influence on prostate cancer risk before
tumorigenesis  actually  occurs.  Our  findings  contribute  to  the  field’s  understanding  of  the 
mechanistic  reason  for  the  overwhelming  association  seen  between  this  8q24  gene  desert  and 
prostate cancer. Explaining the genetic basis for disease risk enables progress towards clinical
applications.

An 8q24 gene desert variant associated with prostate
cancer  risk  confers  differential  in  vivo  activity 
to  a  MYC  enhancer 

Nora F. Wasserman, Ivy Aneas, and Marcelo A. Nobrega

Department  of  Human  Genetics,  University  of  Chicago,  Chicago,  Illinois  60637,  USA 


Genome-wide  association  studies  (GWAS)  routinely  identify  risk  variants  in  noncoding  DNA,  as  exemplified  by  reports  of 
multiple  single  nucleotide  polymorphisms  (SNPs)  associated  with  prostate  cancer  in  five  independent  regions  in  a  gene 
desert  on  8q24.  Two  of  these  regions  also  have  been  associated  with  breast  and  colorectal  cancer.  These  findings  implicate 
functional variation within long-range cis-regulatory elements in disease etiology. We used an in vivo bacterial artificial
chromosome (BAC) enhancer-trapping strategy in mice to scan a half-megabase of the 8q24 gene desert encompassing the
prostate cancer-associated regions for long-range cis-regulatory elements. These BAC assays identified both prostate and
mammary gland enhancer activities within the region. We demonstrate that the 8q24 cancer-associated variant rs6983267
lies within an in vivo prostate enhancer whose expression mimics that of the nearby MYC proto-oncogene. Additionally, we
show that the cancer risk allele increases prostate enhancer activity in vivo relative to the non-risk allele. This allele-specific
enhancer activity is detectable during early prostate development and throughout prostate maturation, raising the
possibility that this SNP could exert its influence on prostate cancer risk before tumorigenesis occurs. Our study represents an
efficient strategy to build experimentally on GWAS findings with an in vivo method for rapidly scanning large regions of
noncoding DNA for functional cis-regulatory sequences harboring variation implicated in complex diseases.

[Supplemental  material  is  available  online  at  http://www.genome.org.] 


Genome-wide  association  studies  (GWAS)  routinely  implicate 
variation  within  gene  deserts  and  other  types  of  noncoding  DNA 
in  the  etiology  of  disease  (Houlston  et  al.  2008;  Silverberg  et  al. 
2009;  Yang  et  al.  2009;  Liu  et  al.  2010).  A  recent  meta-analysis  of 
~1200 disease-associated single nucleotide polymorphisms (SNPs)
found that in 40% of cases, known exonic sequences were absent
from the associated linkage disequilibrium (LD) blocks (Visel et al.
2009). While the presence of nonannotated transcripts or noncoding
RNAs may explain some of the noncoding disease associations,
these observations also have been interpreted as evidence
that many of the associated noncoding regions harbor variants that
alter the activity of long-range cis-regulatory elements controlling
gene expression. Enhancers are one such type of long-range element,
functioning over up to megabase-long genomic distances to
regulate the temporal and tissue-specific expression patterns of
their target gene(s) (Nobrega et al. 2003). A large number of genes
with tissue- and temporal-specific expression patterns are known to
be controlled by an array of enhancers, with each individual
cis-regulatory element driving a subset of its gene's entire expression
profile (Carroll 2008). This modular nature of enhancer activity
makes them ideal candidates for involvement in complex diseases,
as functional variants in an individual cis-element would result in
changes to gene expression only in specific organs/tissue types.

Despite  the  plethora  of  GWAS  signals  implicating  noncoding 
regions  in  complex  disease  risk,  strategies  to  experimentally  follow 
up  on  such  findings  are  lacking.  This  deficiency  stems  principally 
from  the  difficulty  in  identifying  functional  noncoding  sequences 
that  map  remotely  from  their  target  genes.  Programs  such  as 
ENCODE have been addressing this deficiency by developing and
applying technologies to identify these elusive types of long-range
regulatory elements (The ENCODE Project Consortium 2007).
While these technologies have been invaluable in the identification
of putative functional noncoding sequences, they rely heavily
on cell culture and other in vitro and in silico methodology to
identify and experimentally validate enhancers and other elements.
Thus, although these techniques are ideal for functionally
following up on noncoding GWAS results when the relevant cell
type of interest is obvious and accessible, problems can arise if the
putative element under investigation imparts its transcriptional
regulatory effects in a cell type of unpredicted origin or one that is
not amenable to routine culture. Necessary, but lagging, is the
development of simpler in vivo strategies that can concurrently query
the spatial and temporal properties of functional cis-regulatory
sequences within large segments of noncoding DNA. Our goal in this
study is to describe one such strategy for following up on GWAS
results, and to test its ability to uncover noncoding risk variants in
loci associated with complex diseases.

A  striking  example  of  GWAS  implicating  noncoding  variants 
in  the  etiology  of  complex  diseases  can  be  seen  on  chromosome 
8q24,  where  numerous  studies  have  reported  associations  between 
multiple  types  of  cancer — including  prostate,  colorectal,  breast, 
and  urinary  bladder — and  variants  concentrated  within  620  kb  of 
a  1.2-Mb  gene  desert  (Amundadottir  et  al.  2006;  Easton  et  al.  2007; 
Gudmundsson  et  al.  2007;  Haiman  et  al.  2007;  Tomlinson  et  al. 
2007;  Zanke  et  al.  2007;  Ghoussaini  et  al.  2008;  Kiemeney  et  al. 
2008; Al Olama et al. 2009). Evidence for prostate cancer association
within the region is particularly strong, with five distinct LD
blocks spanning a 440-kb interval on 8q24 harboring risk variants
(Fig. 1A, all shaded regions; Ghoussaini et al. 2008; Al Olama et al.
2009). One of these prostate cancer-associated variants, rs6983267,
is independently associated with colorectal cancer (Fig. 1A, green;
Tomlinson et al. 2007), and a second prostate cancer-associated LD
block harbors a distinct SNP (rs13281615) that shows association
with breast cancer (Fig. 1A, pink; Easton et al. 2007). Although no
well-annotated genes lie within this interval, the independent
associated variants (or linked functional elements within the associated
regions) may all be regulating the expression patterns of
a single gene involved in cancer tumorigenesis and/or progression
in various tissue types. The proto-oncogene MYC lies immediately
downstream of this gene desert, raising the possibility that the
associated regions of risk may harbor long-range cis-regulatory elements
involved in the tissue-specific transcriptional regulation of
MYC expression; under this hypothesis, each distinct association
interval might harbor a functional noncoding element involved in
regulating MYC expression in the corresponding tissue type for
each implicated cancer. A summary of the 8q24 gene desert and its
numerous cancer loci is shown in Figure 1. Here, we have chosen
to specifically focus on the multiple independent associations
between this 8q24 gene desert and prostate cancer.

Figure 1. The 8q24 MYC gene desert harbors prostate and mammary gland transcriptional enhancers. (A) Five susceptibility loci within the 440-kb interval
shown to be associated with prostate cancer (all shaded regions; blue denotes a prostate-only association), with one locus independently associated with
breast cancer (pink) and a second associated with colorectal cancer (green). (B) Breast cancer-associated region, (CR) colorectal cancer-associated region,
(P) prostate cancer-associated region. (Blue circle) MYC, (red asterisk) SNP rs6983267. (Below) The three human lacZ-tagged BACs encompassing the
prostate cancer risk regions. (Red dotted lines) The LD block containing SNP rs6983267 (associated with both prostate and colorectal cancers and contained
within BACs RP11-124F15 and CTD-2533C10) is shown in detail. Sequence conservation is shown in chicken and mouse genomes (human genome used as
reference). (B) The male genitourinary apparatus in P8 mice, shown as a cartoon (left) and in wild-type, nontransgenic mice (right). (Dashed line, right)
Outline of the prostate. (B) Bladder, (CG) coagulating gland, (DD) ductus deferens, (P) prostate, (SV) seminal vesicle, (U) urethra. There is endogenous X-gal
staining in the SV and DD. (C) Representative P8 prostates from transgenic mice containing BAC RP11-124F15 or CTD-2533C10 showing prostate and
urogenital apparatus enhancer activity. (Dashed lines) Outlines of prostates. (D) The mammary gland in midgestational pregnant females, shown as
a cartoon (left) and in wild-type, nontransgenic mice (right). The enlargement (left) illustrates a lymph node, ducts, and alveoli in a mammary fat pad.
(LN) Lymph node, (MG) mammary gland. (E) Representative mammary fat pad from a day 14.5 pregnant female harboring BAC RP11-124F15.

Encoding a well-known transcription factor essential to the
regulation of cell proliferation and growth, MYC is up-regulated at
both the mRNA and protein levels in aggressive prostate cancers
(DeMarzo et al. 2003). In addition, copy-number analyses in prostate
cancer specimens have identified the 8q24 region surrounding
MYC as the most common recurrent region of chromosomal gain
(Lapointe et al. 2007). These findings show that prostate cancers
employ multiple mechanisms for achieving MYC overexpression,
through transcriptional up-regulation or through amplification of
gene copy number. We hypothesized that variation within MYC's
long-range cis-regulatory elements could disrupt the quantitative,
temporal, or spatial expression patterns of MYC in the prostate,
possibly underlying the GWAS signals identified in the 8q24 gene
desert. In this study, we describe how an in vivo bacterial artificial
chromosome (BAC) enhancer-trapping strategy efficiently scanned
the 8q24 gene desert for cis-regulatory sequences, and report
on the identification of both prostate and mammary gland enhancer
activities within the assayed regions. We further refined the
prostate enhancer interval, showing that it harbors the prostate
cancer risk SNP rs6983267, and demonstrate that the two resultant
allelic variants display functionally polymorphic prostate
enhancer properties in vivo.

Results 

Surveying  the  regulatory  landscape  of  the  8q24  gene  desert 

To  initially  examine  the  8q24  gene  desert  for  regulatory  elements, 
we  surveyed  the  region  using  a  broad-scale  BAC  scan  approach 
(Spitz et al. 2003). This strategy allows for the rapid and effective
examination of large genomic regions for cis-regulatory elements,
and can be readily applied to any locus of interest. We identified
three overlapping human BACs encompassing the prostate cancer
risk regions (Fig. 1A), which together span 480 kb of noncoding
DNA. Each BAC carried the prostate cancer-associated risk haplotype
and was tagged through a Tn7 transposon-mediated random
insertion of a beta-galactosidase (lacZ) gene driven by a beta-globin
minimal promoter (Spitz et al. 2003). The transposon-mediated
insertion was performed using simple, commercially available kits
(see Methods) and occurs in vitro; the protocol yields rapid results
and can be easily scaled up for the simultaneous tagging of numerous
BACs.

The lacZ cassette integration converts the BACs into enhancer-trapping
systems, whereby any long-range enhancer(s)
contained within each ~180-kb BAC can act upon the reporter
gene to drive tissue- and temporal-specific beta-galactosidase expression.
Any enhancers present within a given BAC are then simultaneously
interrogated using a reporter assay system, allowing
for the concurrent examination of large genomic regions for
functional noncoding elements. The design of overlapping BACs
aids in the efficiency of the system to narrow the critical region of
interest, as expression profiles unique to only one BAC must be due
to uniquely contained sequences; conversely, identical expression
patterns present in overlapping BACs suggest that the functional
element driving beta-galactosidase expression must be contained
in the shared genomic region. Modified BACs were analyzed by
PCR and pulsed-field gel electrophoresis to confirm the integration
of the Tn7β-lacZ reporter cassette. To mitigate any possible effects
of unknown insulator or silencer elements within the BAC sequence,
we selected clones with at least two Tn7β-lacZ integration
events. Each BAC was then injected into fertilized mouse oocytes
to generate transgenic mice in accordance with IACUC regulatory
standards. For each BAC, a minimum of two independent transgenic
founders were obtained and studied; this is necessary to
overcome potential position-dependent expression effects resulting
from random integration of the transgene (BAC).

We assayed lacZ expression at multiple points in prostate organogenesis
and maturation: postnatal days 0 and 8 (P0 and P8)
during  prostate  development,  and  P21,  when  prostate  maturation 
is  virtually  complete  (Sugimura  et  al.  1986).  At  each  developmental 
stage,  prostates  were  dissected  and  stained  for  beta-galactosidase 
expression  using  X-gal  (Fig.  1B,C;  Kothary  et  al.  1989). 

These in vivo BAC transgenic reporter assays identified prostate
enhancer activity contained within the 8q24 gene desert (Fig.
1C). While we did not observe beta-galactosidase prostate expression
in BAC CTD-2506D10 transgenic mice (12 independent
transgenics), animals harboring BACs CTD-2533C10 and RP11-124F15
displayed beta-galactosidase prostate expression at days P0
(data not shown), P8 (Fig. 1C), and P21 (data not shown). As illustrated
in Figure 1C, the beta-galactosidase expression domain of
both BAC RP11-124F15 and BAC CTD-2533C10 extends to other
components of the urogenital system, including the coagulating
glands, urethra, and the lining of the urinary bladder. While the
seminal vesicles and ductus deferens also exhibit X-gal staining,
we and others observed this expression pattern in both wild-type
(Fig. 1B) and transgenic animals, reflecting the presence of endogenous
beta-galactosidase in these structures (Wang et al. 2002;
Krajnc-Franken et al. 2004). As 80% of the prostatic ducts are
formed by day P15 in mice (Sugimura et al. 1986), our data indicate
that the enhancer(s) contained within these two BACs are active
both during and after prostate organogenesis and maturation.


Because some of the prostate cancer-associated regions also
have been associated with breast and colorectal cancer (Fig. 1A), we
chose to additionally assay the mammary glands, colon, and rectum
of those animals transgenic for BACs containing the relevant
regions (BAC RP11-124F15 for breast cancer, and both BACs
RP11-124F15 and CTD-2533C10 for colorectal cancer). Mammary
glands were examined at embryonic day 14.5 (E14.5), when the
mammary buds have fully formed in female embryos; in 11-wk-old
virgin females with mature branched glands; and in prelactating
females 14 d after conception, when the mammary gland
undergoes extensive hyperplasia and tissue remodeling (Hens and
Wysolmerski 2005; Oakes et al. 2006; Sternlicht 2006).

We observed in vivo mammary gland enhancer activity in
mice transgenic for BAC RP11-124F15 (Fig. 1E), which harbors
associated intervals for not only prostate but also breast and
colorectal cancer. Transgenic animals displayed beta-galactosidase
expression in the epithelial compartment (ducts and alveoli;
Hennighausen and Robinson 2005) of the mammary glands of
midgestational pregnant and 11-wk-old virgin females (Fig. 1E;
data not shown). No enhancer activity was seen in E14.5 embryos.
Of note, Jia et al. (2009) recently identified a noncoding element
within this region capable of in vitro enhancer activity in breast
cancer cell lines; this element should be viewed as a strong
candidate for the mammary gland activity we see in vivo.

Characterizing  the  prostate  enhancer 

We next aimed to refine the location of the prostate enhancer(s)
within the BACs driving prostate expression. Because of the highly
similar reporter expression patterns obtained from BACs
RP11-124F15 and CTD-2533C10, including prostate, coagulating gland,
and urethral/bladder lining, we hypothesized that our BAC transgenic
assays were identifying a single prostate enhancer within the
59-kb shared genomic segment of these two BACs. Interestingly,
one of the most strongly associated prostate cancer risk SNPs,
rs6983267, is contained within this 59-kb overlapping interval and
disrupts an evolutionarily conserved sequence (Fig. 1A).

To directly test the rs6983267-containing evolutionarily
conserved element for regulatory potential in vivo, we cloned a
5-kb DNA fragment containing each allele of this SNP into a lacZ
reporter cassette using Invitrogen's Gateway cloning system
(Kothary et al. 1989). Transgenic mice harboring either the risk or
the non-risk variant of rs6983267 were generated and analyzed.
We determined that the conserved sequence containing the
prostate cancer GWAS SNP displayed allele-specific in vivo prostate
enhancer properties (Fig. 2). Specifically, the risk allele, rs6983267-G,
led to consistently stronger beta-galactosidase expression in
prostates and coagulating glands than the non-risk allele, rs6983267-T,
in P0, P8, and P21 transgenic mice (Figs. 2A,B, 3B,C). The expression
pattern driven by the rs6983267-G risk allele in three
independent mouse transgenic lines closely resembled that observed
in BACs RP11-124F15 and CTD-2533C10, both of which also
harbor the risk allele. In contrast, the rs6983267-T non-risk allele
led to weakened prostate and coagulating gland expression in
three independent transgenic lines (Fig. 2B). For each allelic
variant evaluated, those transgenic founders exhibiting enhancer
activity showed highly concordant beta-galactosidase expression in
the prostate, with a clear qualitative difference between the risk
and non-risk variants.

To test whether this spatial reporter expression pattern of the
rs6983267-containing enhancer correlates with endogenous MYC
expression in prostate and other components of the urogenital
system, we performed whole mount in situ hybridizations using
a full-length Myc probe in mouse prostates at P8 (Wilkinson and
Nieto 1993). We observed Myc expression in the male genitourinary
apparatus, including the prostate, in a pattern closely mimicking
the reporter expression of the rs6983267-G enhancer and
BACs CTD-2533C10 and RP11-124F15, both of which harbor the
G risk allele as well (Fig. 2C).

Figure 2. SNP rs6983267 mediates allelic-specific enhancer activity in mouse prostates. Three independent transgenic founders harboring reporter
plasmids driven by either the G (risk) allele (A) or T (non-risk) allele (B) are shown at P8. (Dashed lines) Outlines of prostates; (CG) coagulating glands. The
prostate cancer risk allele leads to consistently stronger beta-galactosidase expression in prostates and coagulating glands than the non-risk allele in vivo.
(C) MYC in situ hybridization at P8 correlates with the reporter expression pattern driven by the rs6983267-containing enhancer.

This same prostate enhancer that we have characterized also
has been shown to act as an allelic-specific long-range MYC
enhancer in colorectal cancer cells (Jia et al. 2009; Pomerantz et al.
2009a; Tuupanen et al. 2009; Wright et al. 2010). Although we did
not observe colorectal enhancer activity in our initial BAC screen
of the region, we again assayed transgenic animals harboring
either the risk or non-risk rs6983267-containing enhancer element
for in vivo enhancer activity in the colorectal area at three
developmental time points. We observed no beta-galactosidase
expression in E14.5 intestines for either construct tested, and
colorectal X-gal staining at P8 and P21 was indistinguishable
between wild-type mice and transgenic animals harboring either
enhancer variant (Supplemental material). Strong endogenous
beta-galactosidase expression is observed in intestines of both
wild-type and transgenic animals starting at E15.5, limiting
our ability to identify in vivo colorectal enhancers in late
embryogenesis and postnatally. These findings highlight the
difficulty in assaying postnatal in vivo intestinal enhancers
using lacZ reporter assays.

Investigations into the embryonic activity of the
rs6983267-containing element demonstrated that while this
enhancer has several spatial domains of expression, its
allele-specific activity is restricted to the prostate and
coagulating glands. Both the rs6983267-G and rs6983267-T enhancer
elements drove expression in several spatial domains of
E11.5 and E14.5 embryos, with no apparent allelic-specific
enhancer activity (Fig. 3A). Transgenics harboring either
haplotype variant showed similar X-gal
staining in the limbs and tail at E11.5, consistent with previously
reported patterns (data not shown; Tuupanen et al. 2009). We also
observed enhancer activity in the developing urinary bladder,
genital tubercle, and limbs in the E14.5 embryos. This pattern,
which precedes prostate development, is also indistinguishable
between the allelic variants of this enhancer (Fig. 3A).

Taken together, our data suggest that the rs6983267-containing
enhancer is part of MYC's regulatory landscape, and that the
variant within this enhancer may increase the risk of prostate cancer
through its role in allelic-specific control of MYC expression in the
prostate.

Discussion 

The BAC enhancer-trapping strategy that we employed allowed us
to rapidly interrogate the 440 kb of 8q24 prostate cancer-associated
noncoding DNA for cis-regulatory elements. We effectively
screened a half-megabase genomic interval in vivo using only
three constructs, identifying the existence of mammary gland and
prostate enhancers in the interval associated with each respective
cancer type. We believe that this methodology provides a significant
advance to current genomic techniques for following up on
GWAS results in noncoding regions, as it can be easily adapted to
examine loci in vivo on a megabase scale. As demonstrated by our
results, this strategy can be used to concurrently identify spatially
and temporally unique enhancers within a large sequence, and can
be useful in refining the critical regions for enhancer mapping,
while still permitting the use of a whole-systems, in vivo animal
model.

Figure 3. The rs6983267-containing enhancer demonstrates distinct temporal regulatory abilities.
Representative G (risk, top) and T (non-risk, bottom) transgenics are shown at a series of developmental
time points. (A) E14.5 transgenic embryos exhibit beta-galactosidase expression in the genital tubercle
and limbs, with no apparent allele-specific enhancer activity. (GT) Genital tubercle. (B,C) Allele-specific
regulatory ability is visible in neonatal P0 pups (B) and P21 adolescent mice (C), with in vivo prostate and
coagulating gland beta-galactosidase expression qualitatively stronger in the risk allele (top) line than
the non-risk variant (bottom). (CG) Coagulating gland, (P) prostate.

These relatively straightforward BAC transgenic reporter assays
also provide a way to more closely approximate the genomic
context of relevant enhancers. By testing ~200 kb of sequence
simultaneously, enhancers are assayed in a context much closer to
their true genomic environment, one where they are subjected to
(largely unknown) modifications by neighboring repressors,
insulators, chromatin changes, and/or various other interactions
with nearby cis sequences. In traditional plasmid-based reporter
assays, this important genomic context is lost. We conducted our
clone selection strategy so as to minimize the potential negative
effects of such insulators or repressors; tagged BACs containing at
least two copies of the Tn7β-lacZ reporter cassette, integrated near
each end of the BAC sequence, were selected for experimental
use. We hypothesized that this would diminish false-negative
results caused by repressive elements in a single-copy integration
clone. When compared with BACs tagged with just a single
Tn7β-lacZ cassette, we observed more reproducible results in mice
transgenic for BACs harboring two Tn7β-lacZ integrations (M.A.N.,
unpubl.).

Because we observed the same urogenital system spatial pattern
of expression in both of the overlapping BACs tested, we
deduced that the enhancer was within the small interval shared
between those BACs. However, it is possible that other prostate
enhancers also exist within the BACs we tested. To formally
exclude this possibility, other approaches could have been used,
including the analysis of additional enhancer-trapping BACs with
complementary overlapping patterns. Alternatively, BAC
recombineering could have been employed to specifically delete our
known enhancer from the BACs assayed. Both approaches are
logical follow-ups to the in vivo BAC transgenic reporter assays,
and would maintain the analytical strengths of assaying enhancers
in their genomic environments.

Recent studies have reported on the colorectal and prostate
enhancer activities of the rs6983267-containing sequence we
describe here (Jia et al. 2009; Pomerantz et al. 2009a; Tuupanen et al.
2009; Sotelo et al. 2010; Wright et al. 2010). Using a combination
of genome-wide in vitro assays, this sequence has been highlighted
as possessing attributes of an enhancer, including specific
chromatin modifications and binding of transcription factors. Several
groups have demonstrated that in colorectal cancer cell lines,
TCF7L2 (TCF4) binds preferentially to the risk allele (rs6983267-G)
of this enhancer (Pomerantz et al. 2009a; Tuupanen et al. 2009;
Wright et al. 2010). Reports regarding the enhancer properties of
this sequence in prostate cancer cell lines have been mixed,
however. When tested in LNCaP and PC-3 prostate cancer cell lines, this
sequence displayed enhancer properties only in the former,
possibly due to the PC-3 line's lack of androgen receptor expression
(Jia et al. 2009). In a second study, this rs6983267-containing
enhancer was unable to drive luciferase expression above
promoter-only levels in LNCaP or PC-3 cells, unless cells were cotransfected
with Tcf4 and beta-catenin expression vectors (Sotelo et al. 2010).
Under those conditions, the rs6983267-containing element
demonstrated allelic-specific enhancer activity in LNCaP cells, but with
the non-risk rs6983267-T variant driving stronger expression than
the risk rs6983267-G allele.

Our in vivo results, showing the cancer risk allele
demonstrating stronger enhancer potential than the non-risk allele,
corroborate those reported in colorectal cancer cell lines (Pomerantz
et al. 2009a; Tuupanen et al. 2009; Wright et al. 2010), and are
concordant with MYC's known role as a proto-oncogene. Our
whole-animal experimental strategy obviated the experimental variation
added by cell lines to clearly show that this element is a functional
prostate enhancer in vivo, while also adding the ability to
investigate enhancer activity throughout organogenesis. We believe
that this broad spatial and temporal characterization of regulatory
potential is ideally afforded by in vivo experimentation, and
propose this as the standard in the follow-up to GWAS risk variants
implicated in human disease.

The rs6983267-containing element physically interacts with
MYC's promoter in both colorectal cancer and prostate cancer cell
lines, providing evidence that this enhancer is involved in regulating
MYC expression in these two tissue types (Pomerantz et al. 2009a;
Sotelo et al. 2010; Wright et al. 2010). Despite these compelling
findings and the fact that altered MYC expression has been
implicated repeatedly in the pathogenesis of prostate cancers (Williams
et al. 2005), no association has been seen between rs6983267
genotype and MYC mRNA levels in normal prostate cells or prostate
tumors (Pomerantz et al. 2009b). This lack of genotype-phenotype
correlation implies that steady-state MYC mRNA levels in adult
prostate tissue may not be the correct biological entity underlying
risk. Our findings demonstrate that the rs6983267-containing
enhancer exhibits differential in vivo activity throughout prostate
organogenesis, and raise the possibility that this variant exerts
its influence on prostate cancer risk long before tumorigenesis
occurs. With widely varying risk allele frequencies in different
populations (from 49% in American Caucasians to 81% in African
Americans; HapMap, merged Phase 1, 2, and 3 frequencies), this
SNP may also have an effect on the population prevalence of both
prostate cancer and colorectal cancer (Jemal et al. 2009).
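
As a rough illustration of what the two quoted risk allele frequencies would mean at the genotype level, the sketch below converts an allele frequency into expected genotype proportions. It assumes Hardy-Weinberg equilibrium, which is an assumption introduced here for illustration and is not a claim made in the text; only the 49% and 81% values come from the paragraph above.

```python
from typing import Dict


def expected_genotypes(risk_allele_freq: float) -> Dict[str, float]:
    """Expected genotype proportions for a biallelic SNP (G = risk, T = non-risk),
    assuming Hardy-Weinberg equilibrium (an illustrative assumption)."""
    if not 0.0 <= risk_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = risk_allele_freq
    q = 1.0 - p
    return {"GG": p * p, "GT": 2.0 * p * q, "TT": q * q}


if __name__ == "__main__":
    # Risk allele frequencies quoted in the text (HapMap).
    for label, freq in [("49% frequency", 0.49), ("81% frequency", 0.81)]:
        proportions = expected_genotypes(freq)
        summary = ", ".join(f"{g}: {v:.3f}" for g, v in proportions.items())
        print(f"{label} -> {summary}")
```

Under this assumption, the expected share of risk-allele homozygotes is roughly 24% at a 49% allele frequency and roughly 66% at an 81% allele frequency.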

We have described how a noncoding SNP strongly associated
with disease can in fact alter the in vivo activity of its
encompassing cis-regulatory element, suggesting a possible impact on
cancer risk before tumorigenesis actually occurs. Although further
studies are warranted, our in vivo temporal data hint at an
underlying molecular explanation for this nongenic SNP's
contribution to prostate cancer risk. These findings emphasize the notion
that thorough investigations into the regulatory impact of
polymorphisms are an indispensable component of the functional
follow-up of GWAS scans, and stress the importance of conducting
these experiments using in vivo systems.


Methods 

Transposon-mediated  BAC  modification 

BACs CTD-2506D10, RP11-124F15, and CTD-2533C10 were
modified by in vitro random transposition of Tn7β-lacZ (Spitz et al.
2003). BAC DNA was extracted by using the Nucleobond AX Kit
(Macherey-Nagel). Twenty nanograms of Tn7β-lacZ vector was
mixed with 20-40 ng of BAC DNA, GPS buffer, and TnsABC
transposase (New England BioLabs), followed by incubation for 10 min at
37°C. Start solution was added and the reaction was extended for
1 h. After heat inactivation for 10 min at 75°C and a 1-h dialysis,
electrocompetent DH10B cells were transformed with 2 µL of the
transposition reaction. Cells were plated on LB agar containing
20 µg/mL kanamycin and 20 µg/mL chloramphenicol. Positive
colonies were first identified by polymerase chain reaction (PCR)
using beta-globin and lacZ primers (Tn7β-lacZ beta-globin F:
AGCATCTATTGCTTACATTTGC; Tn7β-lacZ lacZ R:
ATAGGTTACGTTGGTGTAGATGG). Modified BAC clones were then digested with NotI
and separated by pulsed-field gel electrophoresis overnight on a 1%
agarose gel to determine the number of copies and the position(s)
of the integrated Tn7β-lacZ cassette. Clones with two copies of the
cassette were chosen for further analysis to minimize the possible
influence of silencer or insulator elements within the BACs.

lacZ  plasmid  generation 

The 5 kb of sequence surrounding the rs6983267-containing
conserved element was PCR amplified from human genomic DNA
heterozygous for the rs6983267 SNP (rs6983267 F:
TCTTGACCTGATTGCTGAAAAAT; rs6983267 R:
TCTGGGGGTGAGTTAAATGATAA). The fragment was then purified using the QIAquick PCR
Purification Kit (Qiagen) and cloned into the pDONR 221 Gateway
entry vector (Invitrogen). Colonies were analyzed by restriction
enzyme analysis for successful fragment insertion, and positive
clones were sequenced to determine the allelic status of SNP
rs6983267 (rs6983267-seq F: TAGACACCAAGAGGGAGGTATCA;
rs6983267-seq R: CCAGGTTAAAGGAAACTGAACTG). Clones
harboring either the risk (G) or the non-risk (T)
rs6983267 allele were each transferred to a Gateway-HSP68-lacZ reporter
vector using the LR recombination reaction (Invitrogen) (Poulin
et al. 2005). All plasmids were again verified by restriction analysis
and direct sequencing prior to pronuclear mouse injections.
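
Because the primer sequences above are easy to mistype, a short sanity check can be useful before ordering or re-using them. The sketch below is an illustration rather than part of the published methods: it stores the six primers listed in this Methods section (written 5' to 3', with line-break spaces removed) and reports length, GC fraction, and reverse complement. The dictionary keys and function names are hypothetical labels chosen here.

```python
# Primer sequences as printed in the Methods; keys are illustrative labels.
PRIMERS = {
    "Tn7b-lacZ beta-globin F": "AGCATCTATTGCTTACATTTGC",
    "Tn7b-lacZ lacZ R": "ATAGGTTACGTTGGTGTAGATGG",
    "rs6983267 F": "TCTTGACCTGATTGCTGAAAAAT",
    "rs6983267 R": "TCTGGGGGTGAGTTAAATGATAA",
    "rs6983267-seq F": "TAGACACCAAGAGGGAGGTATCA",
    "rs6983267-seq R": "CCAGGTTAAAGGAAACTGAACTG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_dna(seq: str) -> bool:
    """True if the sequence is non-empty and contains only A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= set("ACGT")


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence, returned 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        assert is_dna(seq), f"unexpected character in {name}"
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
              f"revcomp {reverse_complement(seq)}")
```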

Production  of  transgenic  mice 

Tn7β-lacZ tagged BAC DNA was purified using the Nucleobond
BAC 100 Kit (Macherey-Nagel), rehydrated in injection buffer (10
mM Tris at pH 7.5; 0.1 mM EDTA), and diluted to a concentration
of 2 ng/µL. BAC DNA was injected in its circular form.

Plasmid DNA was purified using the Plasmid Maxi Kit (Qiagen),
and 50 µg of each plasmid was digested with SalI to excise
the vector backbone. Following a gel purification step using the
QIAquick Gel Extraction Kit (Qiagen), the DNA to be injected was
further purified using a standard ethanol precipitation. The
purified DNA was dialyzed for 24 h against injection buffer (10 mM Tris
at pH 7.5; 0.1 mM EDTA), and its concentration was determined
fluorometrically and by agarose gel electrophoresis. The DNA was
diluted to a concentration of 2 ng/µL. Purified BAC and plasmid
DNA were then used for pronuclear injections of CD1 mouse
embryos in accordance with standard protocols approved by the
University of Chicago.
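
The dilution to the 2 ng/µL injection concentration follows the usual C1·V1 = C2·V2 relationship. The helper below is a minimal sketch of that arithmetic; the stock concentration and final volume in the example are hypothetical values, since the text reports only the target concentration.

```python
from typing import Tuple


def dilution_volumes(stock_ng_per_ul: float,
                     target_ng_per_ul: float,
                     final_volume_ul: float) -> Tuple[float, float]:
    """Return (stock volume, buffer volume) in µL using C1*V1 = C2*V2."""
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("target concentration exceeds stock concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


if __name__ == "__main__":
    # Hypothetical example: a 50 ng/µL stock diluted to 2 ng/µL in 100 µL.
    dna_ul, buffer_ul = dilution_volumes(50.0, 2.0, 100.0)
    print(f"{dna_ul:.1f} µL DNA + {buffer_ul:.1f} µL injection buffer")
```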

For the Tn7β-lacZ tagged BACs, multiple stable transgenic
lines were generated for each construct, and F1 animals were
analyzed for each line at multiple postnatal developmental time
points. BAC CTD-2506D10 DNA injections yielded 12
independent lines (0/12 positive for prostate beta-galactosidase
expression); injections of RP11-124F15 and CTD-2533C10 both resulted
in two independent beta-galactosidase-expressing lines.

For the rs6983267-containing enhancer plasmid, a total of
three beta-galactosidase-expressing independent transgenics was
obtained for rs6983267-G; three beta-galactosidase-expressing
independent transgenic animals/lines were also obtained for
rs6983267-T. For several of these independent lines, the F0 animals
themselves were analyzed at P8; this excluded any analysis of the
line at other time points. For the risk allele, rs6983267-G, we
obtained two F0 animals positive for beta-galactosidase expression
in the prostate. The third independent rs6983267-G transgenic
was maintained as a stable line. For the non-risk allele, rs6983267-T,
one F0 transgenic animal was obtained; the remaining two
independent transgenics were maintained as stable lines.

Mouse  in  vivo  transgenic  reporter  assay 

Prostates and mammary glands were harvested from mice at P0, P8,
and P21 and dissected into cold 100 mM phosphate buffer (PBS)
(pH 7.3), followed by 30-45 min of incubation with 4%
paraformaldehyde at 4°C. E14.5 embryos were incubated in 4%
paraformaldehyde for 2 h. Tissues were then washed two times for 20 min
with wash buffer (2 mM MgCl2; 0.01% deoxycholate; 0.02% NP-40;
100 mM phosphate buffer at pH 7.3), and stained for 18 h at room
temperature with freshly made staining solution (0.8 mg/mL X-gal;
4 mM potassium ferrocyanide; 4 mM potassium ferricyanide; 20 mM
Tris at pH 7.5 in wash buffer). After staining, samples were rinsed five
times for 20 min in PBS and post-fixed in 4% paraformaldehyde. For
each animal analyzed, tail samples were taken at the time of
dissection and DNA was isolated through the addition of lysis buffer
(100 mM Tris-HCl at pH 8.5, 5 mM EDTA, 0.2% SDS, 200 mM NaCl,
and 1 mg/mL proteinase K) and incubation overnight at 55°C.
Genotyping was performed by PCR with primers within the
reporter cassette/vector (using beta-globin and lacZ primers for the
Tn7β-lacZ tagged BACs, rs6983267-seq primers for the plasmids).

Imaging 

All photographs were taken using a Leica MZ16 F stereomicroscope
and QCapture Pro software. Settings (lighting, exposure time) were
kept constant between structure- and age-matched samples.
Images displayed in the paper were generated using an image
processing software package (CombineZM) that allows for the creation
of extended depth of field images. Multiple pictures of each
structure were taken at varying depths of field and then
computationally integrated; the focus areas are blended to create a
composite high-resolution image with an extended depth of field. This
allowed for the production of images where all the multiple planes
of the urogenital apparatus appear well focused and defined.
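
The general idea behind extended depth of field compositing can be sketched in a few lines. The example below is a generic illustration and not a description of how CombineZM works internally: for each pixel it keeps the value from whichever image in the stack is locally sharpest, using the absolute discrete Laplacian as a simple sharpness measure. Images are represented as nested lists of grayscale values, and all names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def sharpness(img: Image, r: int, c: int) -> float:
    """Absolute discrete Laplacian at (r, c); neighbours are clamped at edges."""
    rows, cols = len(img), len(img[0])

    def px(i: int, j: int) -> float:
        return img[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return abs(4 * px(r, c) - px(r - 1, c) - px(r + 1, c)
               - px(r, c - 1) - px(r, c + 1))


def focus_stack(stack: List[Image]) -> Image:
    """Composite same-sized images by taking each pixel from the sharpest slice."""
    rows, cols = len(stack[0]), len(stack[0][0])
    composite: Image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            best = max(stack, key=lambda img: sharpness(img, r, c))
            row.append(best[r][c])
        composite.append(row)
    return composite


if __name__ == "__main__":
    # Toy one-row "images": the first has detail on the left, the second on the right.
    left_in_focus = [[0, 9, 0, 5, 5, 5]]
    right_in_focus = [[3, 3, 3, 0, 9, 0]]
    print(focus_stack([left_in_focus, right_in_focus]))
```

Real focus-stacking software typically also aligns the frames and smooths the selection map, which this sketch omits.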

In  situ  hybridization 

In situ hybridization analysis on whole P8 prostates using
digoxigenin-labeled Myc antisense and sense riboprobes was performed
according to standard protocols (Wilkinson and Nieto 1993). The
probes were generated from a full-length mouse Myc cDNA clone
(IMAGE ID 3962047). Staining was performed for 48 h, and the
stained prostates were then transferred to 10% buffered formalin
phosphate prior to imaging.